Skip to content

Fix proxy hang when target is unreachable during client-streaming RPCs#618

Merged
sfc-gh-ikryvanos merged 4 commits into
mainfrom
ikryvanos/fix-file-cp-stall
Apr 27, 2026
Merged

Fix proxy hang when target is unreachable during client-streaming RPCs#618
sfc-gh-ikryvanos merged 4 commits into
mainfrom
ikryvanos/fix-file-cp-stall

Conversation

@sfc-gh-ikryvanos
Copy link
Copy Markdown
Collaborator

@sfc-gh-ikryvanos sfc-gh-ikryvanos commented Apr 10, 2026

What changed (user-visible)

  • Unreachable target: sanssh file cp through the proxy to a target that doesn't exist (or is unreachable) now fails with a clear error instead of hanging indefinitely.
  • Target dies mid-upload: if the target server crashes or is killed during a file upload, the client now reports an error and exits with a non-zero code instead of silently reporting 100% success.
  • Progress bar on error: the progress bar is cleared when an upload fails, so users no longer see a confusing "100%" line followed by an error message.

Why it was broken

TargetStream.Send() — the proxy method that enqueues data for forwarding to the target — only checked the gRPC stream's context for cancellation. Two problems:

  1. Before connection: the gRPC stream context belonged to a placeholder (unconnectedClientStream) and was never cancelled when NewStream failed. Send() blocked forever on the full reqChan.

  2. After connection: when the target died mid-stream, grpcStream.SendMsg returned io.EOF but the send loop exited without cancelling anything. Send() kept accepting data into the dead reqChan because the buffer had room (consumer gone) and the context wasn't cancelled yet.

What was done

proxy/server/target.go:

  • Send() now selects on both s.ctx (our own context, cancelled by cancelFunc) and getStream().Context() (gRPC transport-level cancellation). Both are needed because after setStream() they are different objects.
  • Run() calls s.cancelFunc() when NewStream fails (was missing).
  • Run() calls s.cancelFunc() when SendMsg returns io.EOF (target died mid-stream), so Send() fails immediately on the next call instead of silently enqueueing data.

services/localfile/client/client.go:

  • Progress bar is cleared on error paths so it doesn't render 100% after a failure.

proxy/server/target_test.go:

  • TestSendUnblocksWhenTargetUnreachable — verifies Send unblocks when the target is unreachable.
  • TestSendFailsAfterTargetDiesMidStream — verifies Send returns an error after the target dies mid-stream.

@sfc-gh-ikryvanos sfc-gh-ikryvanos marked this pull request as draft April 10, 2026 15:39
@sfc-gh-ikryvanos sfc-gh-ikryvanos force-pushed the ikryvanos/fix-file-cp-stall branch 2 times, most recently from 617f88f to 8277b25 Compare April 24, 2026 11:58
When NewStream to the target fails (e.g. unreachable host), the
TargetStream context was not cancelled. Any in-flight Send() calls
blocked forever on the full reqChan, because the cancellation context
they select on was never signalled.

Add s.cancelFunc() on NewStream failure, matching the existing
DialContext failure path. This unblocks Send() and lets the error
propagate back to the client.

Made-with: Cursor
@sfc-gh-ikryvanos sfc-gh-ikryvanos force-pushed the ikryvanos/fix-file-cp-stall branch from 8277b25 to dec8f4b Compare April 24, 2026 12:12
When grpcStream.SendMsg returns io.EOF (target died), cancel the stream
context immediately so TargetStream.Send fails on the next call instead
of silently enqueueing data into a dead channel. Also switch Send() to
select on s.ctx (the stream's own context) rather than
getStream().Context() so it sees the cancellation after setStream()
has replaced the placeholder stream.

Clear the progress bar on error paths in file cp to avoid showing a
confusing 100% after an error message.

Made-with: Cursor
@sfc-gh-ikryvanos sfc-gh-ikryvanos force-pushed the ikryvanos/fix-file-cp-stall branch from c033bce to a1225b7 Compare April 24, 2026 12:59
Belt-and-suspenders: s.ctx catches our explicit cancelFunc calls,
getStream().Context() catches gRPC-level transport cancellations.

Made-with: Cursor
@sfc-gh-ikryvanos sfc-gh-ikryvanos force-pushed the ikryvanos/fix-file-cp-stall branch from a1225b7 to 2e7a702 Compare April 24, 2026 13:01
@sfc-gh-ikryvanos sfc-gh-ikryvanos marked this pull request as ready for review April 24, 2026 13:24
Copy link
Copy Markdown
Collaborator

@sfc-gh-jdubiel sfc-gh-jdubiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sfc-gh-ikryvanos sfc-gh-ikryvanos merged commit 777e850 into main Apr 27, 2026
6 checks passed
@sfc-gh-ikryvanos sfc-gh-ikryvanos deleted the ikryvanos/fix-file-cp-stall branch April 27, 2026 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants